As a truly international metropolis, NYC has a total of 80 different cuisine types in our dataset. We first plot the 10 most frequent cuisine types in NYC.
From the plot above, the top 10 cuisine types across NYC are
American, Chinese, Coffee/Tea,
Pizza, Bakery Products/Desserts,
Mexican, Japanese, Italian,
Latin American, and Caribbean, which reflects the
racial diversity of NYC. American, Chinese, and
Coffee/Tea are the three most common cuisine types, which suggests
they are the most popular among New Yorkers. We then inspect the 10
most frequent cuisine types in each borough.
## `summarise()` has grouped output by 'boro'. You can override using the
## `.groups` argument.
The three most frequent cuisine types in Manhattan
and Brooklyn are American,
Coffee/Tea, and Chinese. In
Queens, Chinese, Latin American,
and American are the top three. This
makes sense, since Queens has the largest Asian American population by
county outside the Western United States. In the Bronx, the top three
are Pizza, American, and Chinese.
In Staten Island, American,
Donuts, and Pizza are the most common.
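The per-borough ranking described above can be sketched as a grouped count followed by a top-n selection. This is only an illustration on invented toy data, not the code behind the figure; the `toy` tibble and the `top_by_boro` name are assumptions made for the example.

```r
library(dplyr)

# Toy stand-in for the inspection data; the rows are invented for illustration.
toy <- tibble(
  boro = c(rep("Queens", 6), rep("Bronx", 6)),
  cuisine_description = c(
    "Chinese", "Chinese", "Chinese", "Latin American", "Latin American", "American",
    "Pizza", "Pizza", "Pizza", "American", "American", "Chinese"
  )
)

# Count restaurants per borough and cuisine, then keep the 2 largest counts per borough.
top_by_boro <- toy %>%
  count(boro, cuisine_description, name = "n") %>%
  group_by(boro) %>%
  slice_max(n, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top_by_boro)
```

With the real data, the same pattern with `n = 3` would return the top three cuisine types per borough.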
# Top 10 cuisine types within each price level
price_cuisine <-
  inspection_raw %>%
  select(dba, boro, cuisine_description, critical_flag, score, grade, grade_date,
         inspection_type, latitude, longitude, rating, review_num, price) %>%
  drop_na(boro, price) %>%
  mutate(price = as.factor(price)) %>%
  group_by(price) %>%
  count(cuisine_description) %>%
  mutate(cuisine_description = fct_reorder(cuisine_description, n)) %>%
  # the data are still grouped by price, so the rank is computed per price level
  filter(min_rank(desc(n)) <= 10) %>%
  ggplot(aes(x = cuisine_description, y = n, fill = price)) +
  geom_bar(position = "dodge", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5)) +
  labs(
    x = "Cuisine type",
    y = "Number") +
  facet_grid(~price)
ggplotly(price_cuisine)
At price level $, the three most frequent cuisine
types are Chinese, Pizza, and
American. At price level $$, they are American,
Chinese, and Coffee/Tea. At price level
$$$, they are American,
Italian, and Chinese. At price level
$$$$, Japanese and American dominate.
Only a few restaurants in our dataset are labeled
$$$$, which seems to contradict the common view
that NYC has many upscale, expensive restaurants.
This might be due to the limits of our data source: our dataset
was selected and merged from restaurants that are or have been
under inspection and that can also be found on
Yelp. Very expensive restaurants might be missing from the
inspection records or might not be searchable on Yelp.
# Number of restaurants at each price level, by borough
price_boro <-
  inspection_raw %>%
  select(dba, boro, cuisine_description, critical_flag, score, grade, grade_date,
         inspection_type, latitude, longitude, rating, review_num, price) %>%
  drop_na(boro, price) %>%
  mutate(boro = fct_infreq(boro),   # order boroughs by number of restaurants
         price = as.factor(price)) %>%
  ggplot(aes(x = boro, fill = price)) +
  geom_bar()
ggplotly(price_boro)
The plot above shows the number of restaurants at each price level in
each borough. The proportions of restaurants at price levels
$$$ and $$$$ are tiny in all
boroughs. The most common price level in Manhattan is $$,
while in the rest of the boroughs.
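Because the stacked bars show raw counts, the share of each price level within a borough can be easier to read from a small table. The sketch below uses invented toy data (the `toy` tibble and the `price_share` name are assumptions for illustration) and computes the within-borough proportion of each price level.

```r
library(dplyr)

# Toy stand-in for the merged inspection/Yelp data; values are invented.
toy <- tibble(
  boro  = c(rep("Manhattan", 5), rep("Bronx", 4)),
  price = c("$", "$$", "$$", "$$", "$$$", "$", "$", "$", "$$")
)

# Count restaurants per borough and price level, then normalise within each borough.
price_share <- toy %>%
  count(boro, price) %>%
  group_by(boro) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(price_share)
```

For a plot rather than a table, `geom_bar(position = "fill")` in ggplot2 draws the same within-borough proportions as stacked bars of equal height.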
# Total number of reviews at each rating, for restaurants with more than 100 reviews
review_num_rating <-
  inspection_raw %>%
  select(dba, boro, cuisine_description, critical_flag, score, grade, grade_date,
         inspection_type, latitude, longitude, rating, review_num, price) %>%
  drop_na(boro, price, rating) %>%
  filter(review_num > 100) %>%
  group_by(rating) %>%
  summarize(sum_review = sum(review_num)) %>%
  ggplot(aes(x = rating, y = sum_review)) +
  geom_point() +
  geom_smooth() +
  # axis labels now match the aesthetics: rating on x, summed reviews on y
  labs(
    x = "Rating",
    y = "Sum of review numbers"
  )
ggplotly(review_num_rating)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
We were curious about the relationship between rating and number of reviews, so we made this plot to look for any association. As shown above, the plot is strongly left-skewed. For most ratings below 3.0, the summed number of reviews is under 10,000, which means that restaurants with low ratings tend to have few reviews. This also offers some insight for the model-building part, where the number of reviews is used as a predictor.
# Yelp rating by price level
inspection_raw %>%
  filter(!is.na(price)) %>%
  ggplot(aes(x = price, y = rating, fill = price)) +
  geom_boxplot() +
  labs(title = "Yelp Rating vs. Price")

# Density of Yelp rating within each price level
inspection_raw %>%
  filter(!is.na(price)) %>%
  ggplot(aes(y = rating, fill = price)) +
  geom_density() +
  labs(title = "Density Plot") +
  facet_grid(~price)

There seems to be a positive relationship between the cost of dining and the Yelp rating of NYC restaurants. The distribution of restaurant ratings appears to be left-skewed, probably due to a small number of restaurants with ratings much lower than the majority of the data.
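The direction of skew can also be checked numerically instead of by eye. The sketch below defines a simple moment-based sample skewness in base R (the `skewness` helper and the `ratings` vector are hypothetical and invented for illustration). A negative value indicates a longer tail toward low values (left skew), and a positive value a longer tail toward high values (right skew).

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Invented ratings: most values are high, with a few much lower ones.
ratings <- c(4.5, 4.5, 4.0, 4.0, 4.0, 3.5, 3.5, 2.0, 1.5)

skew_value <- skewness(ratings)
print(skew_value)        # negative here, i.e. a longer lower tail
print(median(ratings))   # the median exceeds the mean in this toy example
print(mean(ratings))
```

On the real data, the same helper could be applied to `rating` within each price level to compare the panels of the density plot.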
This map displays the geographical locations of the restaurants in the dataset. The map is interactive, and the markers are split by price level.
# Interactive map of restaurants, with one trace per price level
price_rating_map = inspection_raw %>%
  mutate(text_label = str_c("Price: ", price, "\nRating: ", rating)) %>%
  drop_na(price) %>%
  plot_mapbox(
    lat = ~latitude,
    lon = ~longitude,
    mode = "markers",   # the duplicated mode argument has been removed
    split = ~price,
    hovertext = ~text_label) %>%
  layout(
    mapbox = list(
      style = 'dark',
      zoom = 12.5,
      center = list(lat = 40.71, lon = -73.98)))
price_rating_map %>% config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"))
## Warning: Ignoring 8 observations
In general, Manhattan has the largest number of restaurants and a much denser distribution than the other boroughs. In the Bronx, Queens, Brooklyn, and Staten Island, the proportions of restaurants in the least and second least expensive categories are much higher than the proportions of more expensive dining places. In addition, the majority of the restaurants in the most expensive category are located in Manhattan.
# Interactive map of restaurants, with one trace per cuisine type
cuisine_map = inspection_raw %>%
  mutate(text_label = str_c("Cuisine: ", cuisine_description)) %>%
  drop_na(price) %>%
  plot_mapbox(
    lat = ~latitude,
    lon = ~longitude,
    mode = "markers",   # the duplicated mode argument has been removed
    split = ~cuisine_description,
    hovertext = ~text_label) %>%
  layout(
    mapbox = list(
      style = 'dark',
      zoom = 12.5,
      center = list(lat = 40.71, lon = -73.98)))
cuisine_map %>% config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"))
## Warning: Ignoring 8 observations